Time-dependent information transmission in a model regulatory circuit 
Many biological regulatory systems process signals out of steady state and respond with a physiological delay. A simple model of regulation which respects these features shows how the ability of a delayed output to transmit information is limited: at short times by the timescale of the dynamic input, at long times by that of the dynamic output. We find that topologies of maximally informative networks correspond to commonly occurring biological circuits linked to stress response and that circuits functioning out of steady state may exploit absorbing states to transmit information optimally.




To respond to environmental changes, regulatory biochemical networks need to transform the molecular signals they receive as input into concentrations of response molecules. These processes are inherently stochastic, as both input and output molecules are often present in small numbers. This observation has motivated a number of recent works which pose the search for design principles as an optimization problem over the network topology and reaction rates. Such an approach has demonstrated that, within a network that functions at steady state, the system can exploit the molecular details of the network to transmit information [THB]. Many of the current approaches have looked at instantaneous information transmission [3 [S], or the rate of information transmission m [TU]. However, regulatory response is often delayed relative to input signaling, e.g., due to transcription and translation processes. Examples include the chemotactic response of bacteria [12] or amoeba [13] to nutrients or, conversely, to antibiotics [14]. Optimal design therefore entails maximizing information transmission between the input at a given time and the output at a later time.

In this paper we find optimal circuit designs to transmit information in situations where the response is measured at a later time. Biochemical regulatory networks can be very complex, and many of their molecular details can affect their information processing functions. Here we focus on a simplified model consisting of only two binary components, the input $z$ and the output $x$, which switch randomly between two discrete states, $-1$ and $+1$, in continuous time (see Fig. 1 for an example time series illustrating the 8 associated transition rates). This can model, e.g., the expression state of two genes, or a gene and a protein, each of which can be up- ($z, x = +1$) or down-regulated ($z, x = -1$). Already within this simple model, our approach uncovers topologies that correspond to frequently observed biological circuits [15-17].
FIG. 1: Time evolution of random variable $z(t)$, which models a parent gene or protein transitioning from/to a down-regulated state ($-1$) to/from an up-regulated state ($+1$), with rates $\{u_m, u_p\}$/$\{d_m, d_p\}$, respectively. Random variable $x(t)$ models activation ($+1$) or deactivation ($-1$) of a child gene (or protein): it is regulated by $z$, with which it aligns ('activation', or up-regulation) with rates $r_m$ or $r_p$, or anti-aligns ('repression', or down-regulation) with rates $s_m$ or $s_p$. The subscripts m and p in the rates account for the state of the other variable, that is $-1$ and $+1$, respectively.

Our system at any time is fully described by a four-state probability distribution $p(y)$, where $y = (x, z) \in \{(-,-), (-,+), (+,-), (+,+)\}$ is a joint variable for the output and the input. The temporal evolution of the conditional probability $p(y'|y)$ to find the system in state $y'$ at time $t$ given state $y$ at $t = 0$ is given by a continuous-time master equation $\partial_t p = -\mathcal{L}p$, where $\mathcal{L}$ is a $4\times 4$ transition matrix set by the rates of switching between the states (defined in Fig. 1). In $p(y'|y)$ and in the rest of the paper, primed variables refer to the system state at time $t$, and unprimed variables to the initial time 0. The solution of the master equation can formally be written as $p = e^{-t\mathcal{L}}$ and is conveniently expressed in terms of its expansion in the left and right eigenvectors of $\mathcal{L}$ (see Appendix A). In particular, the (normalized) right eigenvector corresponding to the null eigenvalue is the stationary state $p_\infty(y') = \lim_{t\to\infty} p(y'|y)$.

The information between the input $z$ at time 0 and the output $x'$ at a later time $t$ is defined as

$$I[x_t, z_0] = \sum_{x', z} p(x', z)\,\log_2 \frac{p(x', z)}{p(x')\,p_0(z)}, \tag{1}$$

where the joint probability distribution $p(x', z)$ can be readily derived from the conditional distribution $p(y'|y)$ and the initial distribution $p_0(y)$ (see Appendix A). The case in which the network must respond repeatedly to an input which is in statistical steady state differs from that in which the network must respond only once to its input, e.g. by producing an enzyme to metabolize a newly available sugar [11]. While in the former the input signal is determined by $\mathcal{L}$ (i.e. $p_0 = p_\infty$), in the latter $p_0$ may also be optimized, as the cell may control the form of the input it presents to the regulatory network. We consider both optimization cases in the context of our model.
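To make Eq. (1) concrete, here is a minimal Python sketch. NumPy/SciPy are our own choice, since the authors' implementation is in MATLAB. It assumes the rate conventions of Fig. 1 and the state ordering $(-,-), (-,+), (+,-), (+,+)$, and all function names are hypothetical. The sketch builds $\mathcal{L}$, propagates with $e^{-t\mathcal{L}}$, and evaluates $I[x_t, z_0]$ for model A started in its stationary state.

```python
import numpy as np
from scipy.linalg import expm

# Joint states y = (x, z), in the order (-,-), (-,+), (+,-), (+,+).
STATES = [(-1, -1), (-1, +1), (+1, -1), (+1, +1)]


def rate_matrix(um, up, dm, dp, rm, rp, sm, sp):
    """Matrix L of dp/dt = -L p; column = initial state, row = final state.

    Assumed conventions: z-rates (u: z -> +1, d: z -> -1) carry the subscript
    of the current x; x-rates (r: align with z, s: anti-align) carry that of z.
    """
    return np.array([
        [um + sm, -dm,     -rm,     0.0],
        [-um,     dm + rp, 0.0,     -sp],
        [-sm,     0.0,     up + rm, -dp],
        [0.0,     -rp,     -up,     dp + sp],
    ])


def stationary(L):
    """Right eigenvector of the null eigenvalue, normalized to a probability."""
    w, V = np.linalg.eig(L)
    v = np.real(V[:, np.argmin(np.abs(w))])
    return v / v.sum()


def joint_xprime_z(L, t, p0):
    """p(x', z): output x' at time t jointly with input z at time 0."""
    G = expm(-t * L)                  # G[y', y] = p(y' | y)
    joint = np.zeros((2, 2))          # rows x' = -, +; columns z = -, +
    for j, (_, z) in enumerate(STATES):
        for i, (xp, _) in enumerate(STATES):
            joint[(xp + 1) // 2, (z + 1) // 2] += G[i, j] * p0[j]
    return joint


def mutual_info_bits(joint):
    """Eq. (1), with the convention 0 log 0 = 0."""
    outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / outer[mask])))


# Example: model A (u_m = u_p = d_m = d_p = u, r_m = r_p = r, s = 0) in steady state.
u, r = 0.3, 1.0
L = rate_matrix(u, u, u, u, r, r, 0.0, 0.0)
p0 = stationary(L)
print(mutual_info_bits(joint_xprime_z(L, 1.0, p0)))
```

Building $\mathcal{L}$ explicitly, with columns summing to zero, keeps the same code usable for all eight-rate models discussed below.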

A trivial way to maximize information transmission is for the input to change infinitely slowly relative to a fixed delay time $t$ (i.e. $u_x \to 0$, $d_x \to 0$, $r_x > 0$; $x = p, m$, following the notation of Fig. 1), such that for any finite $t$ the output yields a noiseless readout of the input, i.e. $p(x_t = z_0) \to 1$. In short, nothing happens. To constrain our optimization such that information is transduced on a timescale set by the system's own dynamics, we optimize the information

$$\mathcal{I}(\tau) \equiv I[x_{t=\tau/\lambda}, z_0], \tag{2}$$

where the rate $\lambda$ is given by the smallest non-zero eigenvalue of $\mathcal{L}$. $\mathcal{I}(\tau)$ implicitly depends on the rates appearing in $\mathcal{L}$ and, if it is not set by $p_0(y) = p_\infty(y)$, on the initial distribution $p_0(y)$. We find network architectures that transmit information optimally when the response time $t$ is comparable to the relaxation timescale of the system, i.e. we maximize $\mathcal{I}$ in Eq. (2) at fixed $\tau = \lambda t$ over the rates p3].

To gain intuition, we start by considering the simplest possible case (model A), where $z$ up-regulates $x$ perfectly, symmetrically, and without feedback. In this case, $x$ is slaved to $z$ and switches only if $x \neq z$ (see Fig. 1 with $s_m = s_p = 0$, $u_m = u_p = d_p = d_m = u$ and $r_p = r_m = r$). This leaves us with only two timescales, set by $u$ and $r$.

In the steady-state case ($p_0(y) = p_\infty(y)$), the mutual information can be computed explicitly and is related to the entropy of an effective two-state spin variable:

$$I[x_t, z_0] = \frac{1+\mu}{2}\log_2(1+\mu) + \frac{1-\mu}{2}\log_2(1-\mu), \tag{3}$$

where the "effective magnetization" $\mu = 2[p(x'=z, z) - p(x'\neq z, z)]$ is

$$\mu = \frac{r(r+2u)\,e^{-2ut} - 4ur\,e^{-rt}}{(r-2u)(r+2u)}$$

(see Appendix D) 2T. For long times, $\mu \to 0$ and $I[x_t, z_0] \to 0$, as expected. As previously shown for a different model [18], the mutual information for a system initially in steady state has a maximum at a non-zero delay $t^* = \log\left(\frac{r+2u}{2r}\right)/(2u - r)$, which is determined by the interplay of the two timescales introduced above. Here, we are not interested in finding the timescales over which information transmission is maximal, but rather the rates that maximize $\mathcal{I}(\tau)$ at fixed rescaled time [25] $\tau = \lambda t$, where $\lambda = \min\{2u, r\}$ for model A (see Fig. 5).
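The closed forms above can be checked with a short sketch. NumPy is assumed and the helper names are hypothetical. It evaluates $\mu(t)$ and Eq. (3) for model A, then compares the location of the maximum on a fine time grid with the formula for $t^*$.

```python
import numpy as np


def mu_model_a(t, u, r):
    """Effective magnetization of model A in steady state (requires r != 2u)."""
    return (r * (r + 2 * u) * np.exp(-2 * u * t)
            - 4 * u * r * np.exp(-r * t)) / ((r - 2 * u) * (r + 2 * u))


def info_from_mu(mu):
    """Eq. (3): mutual information (bits) of a flip-symmetric 2x2 joint."""
    mu = np.asarray(mu, dtype=float)

    def term(a):
        # a * log2(a), using the convention 0 * log2(0) = 0
        return np.where(a > 0, a * np.log2(np.where(a > 0, a, 1.0)), 0.0)

    return 0.5 * (term(1 + mu) + term(1 - mu))


def optimal_delay(u, r):
    """t* = log((r + 2u) / (2r)) / (2u - r); the limit r -> 2u gives 1/(2r)."""
    if np.isclose(r, 2 * u):
        return 1.0 / (2 * r)
    return np.log((r + 2 * u) / (2 * r)) / (2 * u - r)


# Example: the information first rises, peaks at t*, then decays.
t = np.linspace(0.0, 5.0, 50001)
I = info_from_mu(mu_model_a(t, 0.3, 1.0))
print(t[np.argmax(I)], optimal_delay(0.3, 1.0))
```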

The optimal information $\mathcal{I}^*(\tau)$ and parameters $(u^*(\tau), r^*(\tau))$ for the simplest model, where $z$ activates $x$, are plotted in Fig. 2A [35]. We see a clear crossover in terms of the parameter $u^*$ that regulates the state of $z$ (see dashed vertical line in Fig. 2A): it initially increases with $\tau$ and then plateaus at $u^* = 0.5$. This crossover results from the fact that for $r > 2u$ the relaxation time is dominated by the rate at which the input changes ($\lambda = 2u$), whereas for $r < 2u$ the rate at which the output changes fixes relaxation times ($\lambda = r$). Over short timescales, information transmission is dominated by the faster rate $r$. Over long timescales, optimality is achieved by matching the characteristic times of the two processes ($r = 2u$). The degeneracy of the two smallest non-zero eigenvalues is a non-trivial generic feature of optimal networks that we also find in more complex models (see below). The dynamics of model A can be summarized by the network topology and regulatory circuits shown in Fig. 3A.

We can generalize model A by allowing $z$ to down-regulate $x$ as well, that is, to allow $s_m, s_p \neq 0$ (model B). As in model A, we forbid feedback from $x$ to $z$, hence the transition rates for $z$ do not depend on the state of $x$ (i.e. $u_m = u_p = u$ and $d_p = d_m = d$). Optimization yields only solutions coinciding with that of model A, or with its symmetric counterpart, wherein $z$ perfectly down-regulates $x$ instead of perfectly up-regulating it ($r_p = r_m = 0$ and $s_p = s_m = s$). Intuitively, in order for information to be transduced between $x$ and $z$, they either align or anti-align, resulting in the common simple activator or repressor element [19]. Note that the same topological structure is found at all timescales $\tau$. This is to be contrasted with previous studies [U [5D] which, taking into account the molecular cost paid by producing higher copy numbers (e.g., creating more proteins), have found small discrepancies between the information transmitted in the two cases of up-regulation and down-regulation. Since our model does not explicitly account for protein copies, we do not observe such a difference.

FIG. 2: Optimal parameters and transmitted information $\mathcal{I}^*$ for the activator and feedback models in steady state (A and C) and relaxing the steady state assumption ($\bar{A}$ and $\bar{C}$), as functions of the rescaled time $\tau$. In panels C and $\bar{C}$, parameters in square brackets refer to alternative optimal topologies (see right diagrams in Fig. 3C and $\bar{C}$). Subscripts m, p are omitted when $X_m = X_p$ (with $X = u, d, r, s$). Results for models B and $\bar{B}$ are shown in Appendix G.

Recent studies [BJ [ini HI] have pointed to the important role of feedback in transmitting information, a form of which we can consider using the full set of 8 rates in Fig. 1 (model C). Now the hierarchical relation between $z$ and $x$ is broken: both can regulate each other's expression, either by down- or up-regulation. Examples of maximally informative topologies for all possible time delays $\tau$ are illustrated in Fig. 3C and reveal a "push-pull" network: one gene (or protein) up-regulates the other, which in turn down-regulates the first gene/protein. Again, due to the symmetry of the problem, we can flip either $x$ or $z$, and we find 4 equally informative solutions (two of which are shown in Fig. 3C), associated with different sets of the 8 rates being driven to zero by the optimization procedure. The information transmission is again controlled by the relaxation time, which switches from being dominated by the input timescale at small times to being dominated by the output timescale at longer times (see dashed vertical line in Fig. 2C). We find that the optimal value of $u$ now plateaus at $3 - 2\sqrt{2}$ for long times. This value is set by competition with the rate $r$, by matching the two smallest non-zero eigenvalues in order to avoid oscillatory solutions: for $u > u^* = 3 - 2\sqrt{2}$ the eigenvalues become complex, describing oscillations (see Appendix E for a more detailed derivation). Push-pull networks can oscillate [22] [37], thwarting optimal information transmission by decorrelating the system; the optimal parameters instead avoid oscillations, such that one of the regulatory timescales is much shorter than the other. This increases the relaxation time of the system, and the circuit in model C can transmit more information at a fixed relaxation time than the optimal circuit without feedback (Fig. 4). Feedback leads to a rotational directionality among the transitions (cf. Fig. 3), in which the state never 'flips back', enhancing the transduced information.
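The oscillation threshold can be illustrated numerically. The sketch assumes a push-pull parameterization in which $d_p = u_m = u$, $r_p = r_m = r$ and all other rates vanish, which is the example topology of Appendix E. NumPy is assumed and the function names are hypothetical. The sketch reports whether $\mathcal{L}$ has complex eigenvalues as $u/r$ crosses $3 - 2\sqrt{2}$.

```python
import numpy as np


def push_pull_rate_matrix(u, r):
    """Assumed push-pull example: u_p = d_m = s_p = s_m = 0, d_p = u_m = u,
    r_p = r_m = r; state order (-,-), (-,+), (+,-), (+,+)."""
    return np.array([
        [u,    0.0, -r,   0.0],
        [-u,   r,    0.0, 0.0],
        [0.0,  0.0,  r,   -u],
        [0.0, -r,    0.0,  u],
    ])


def has_complex_modes(u, r, tol=1e-9):
    """True if any eigenvalue of L has a non-negligible imaginary part."""
    eigs = np.linalg.eigvals(push_pull_rate_matrix(u, r))
    return bool(np.any(np.abs(eigs.imag) > tol))


# Threshold on u/r where Delta = r^2 - 6 r u + u^2 vanishes (smaller root).
u_crit = 3 - 2 * np.sqrt(2)
print(has_complex_modes(0.9 * u_crit, 1.0), has_complex_modes(1.1 * u_crit, 1.0))
```

Below the threshold, all relaxation modes are real and decay monotonically. Above it, a pair of modes rotates around the cycle of states, which is the oscillatory regime the optimal parameters avoid.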

Such push-pull circuits are very common in biology, from microbes [15] to humans ([16] and references therein), as a source of oscillations [22] and pulse responses [IS] (see discussion below).





FIG. 3: Optimal topologies and dynamics for the activator and feedback models in steady state (A and C) and relaxing the steady state assumption ($\bar{A}$ and $\bar{C}$). In the two feedback models (C and $\bar{C}$) the optimal topology is a "push-pull": one gene (or protein) activates the other, which in turn represses the first gene/protein (as we can see, the roles of $z$ and $x$ are interchangeable). The dashed line in $\bar{C}$ means that the feedback exists only until the absorbing state is reached. The topologies of these optimal networks were found by inspecting the optimal rates manually. Results for models B and $\bar{B}$ are shown in Appendix G.

Having discussed optimal delayed information transmission of repeated readouts in the stationary state, we now consider regulatory networks that optimize a one-time response to an external signal (e.g. by producing an enzyme to metabolize a nutrient that appeared in the environment). Specifically, we consider the same three models studied above (A, B, C), but now optimize simultaneously over both the rates and the initial probability distribution $p_0(y)$. We refer to the associated models as $\bar{A}$, $\bar{B}$ and $\bar{C}$, respectively. We start with model $\bar{A}$, which enjoys an up-down symmetry that suggests parameterizing the initial distribution $p_0(y) = p_0(x, z)$ via $p_0(+,+) = p_0(-,-) = (1+\mu_0)/4$ and $p_0(+,-) = p_0(-,+) = (1-\mu_0)/4$ [21]. The mutual information is still given by Eq. (3), with $\mu = \mu_0 e^{-rt} + \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right)$. For $t = 0$, the highest information is attained for $\mu = \mu_0 = \pm 1$, when $z$ and $x$ are perfectly aligned/anti-aligned. Moreover, $\mu$ and $I[x_t, z_0]$ then decrease exponentially with time $t$. Therefore, unlike in model A, the information transmission does not improve by making a delayed readout.
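A minimal check of the $\bar{A}$ expression for $\mu$ follows. NumPy is assumed and the function name is hypothetical. With $\mu_0 = 1$ the magnetization decreases monotonically, so there is no maximum at $t^* > 0$. With $\mu_0 = r/(r+2u)$, the stationary value, it reduces to the steady-state expression of model A.

```python
import numpy as np


def mu_model_a_bar(t, u, r, mu0):
    """mu(t) = mu0 e^{-rt} + r/(r - 2u) (e^{-2ut} - e^{-rt}), valid for r != 2u."""
    return mu0 * np.exp(-r * t) + r / (r - 2 * u) * (np.exp(-2 * u * t) - np.exp(-r * t))


# With a perfectly aligned start (mu0 = 1), mu stays positive and only decays,
# so the information of Eq. (3), which grows with |mu|, is largest at t = 0.
t = np.linspace(0.0, 20.0, 2001)
mu = mu_model_a_bar(t, 0.3, 1.0, 1.0)
print(mu[0], mu[-1])
```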

After performing the optimization, we find that for each timescale $\tau$ the optimal initial state concentrates on the states $(+,+)$ and $(-,-)$, so that $I[x_0, z_0] = 1$. For optimal initial states (model $\bar{A}$), $I[x_t, z_0]$ has no maximum at $t^* > 0$. This suggests that, at odds with the stationary case (model A), information transmission is governed only by the loss of information about the initial state as the system relaxes to steady state.

When information transmission is maximized over the initial distribution in the more general models $\bar{B}$ and $\bar{C}$, we find a qualitative difference in design as compared to the steady state case (models B and C). While the optimal topology remains the same, now either one of the aligned or non-aligned states becomes an absorbing state, e.g., $p_\infty(y') = \delta_{y',(+,+)}$ or $p_\infty(y') = \delta_{y',(-,+)}$ in the examples in Fig. 8 [29]. As above, symmetry provides a number of optimal networks related by permutations (see Fig. 7$\bar{B}$ and Fig. 8). The rates of the optimal networks are shown for a case of $z$ up-regulating [down-regulating] $x$ in Fig. 6 ($\bar{B}$) and Fig. 2 ($\bar{C}$).

The occurrence of an absorbing state, with a nearly equal initial distribution over the initial and final states, limits the accessible states and leads to the optimal topology for a one-time response. In the absence of feedback (model $\bar{B}$, e.g. receptor activation in a complex pathway), the system starts in the inactive state $(x, z) = (-,-)$. When it is presented with a signal, $(x, z) \to (-,+)$, it switches on a response, $(x, z) \to (+,+)$ (see Fig. 7$\bar{B}$). However, in the presence of feedback (model $\bar{C}$, e.g. a nutrient activating the production of an enzyme for its uptake, or amino acid biosynthesis), the optimal dynamics includes "feedback inhibition", in which the output switches off the input [TT] [30]. As in model C, feedback imposes an order on the visited states, with a smaller rate for $z$ transitions than for $x$ transitions. The initial distribution admits only two initial states ($(-,+)$ and $(-,-)$ in Fig. 8, left), so that $I[x_0, z_0] = 0$, unlike in model $\bar{A}$. Inspecting Fig. 8 (left), and recalling that the rate for $z$ transitions is small, we see that $x_t = +$ could only imply $z_0 = +$; similarly, $x_t = -$ means the system started in the absorbing state $(-,-)$. In the biological example above, this corresponds to the enzyme switching off $(-,-)$ until a new source of sugar appears.
FIG. 4: Summary of optimization results: for each of the six models, the hull curve of the maximized information $\mathcal{I}^*(\tau)$ is plotted versus $\tau$. The information decay is slower when feedback is present (model C) and when the system is initially in an optimal state (models $\bar{A}$, $\bar{B}$, $\bar{C}$).

In Fig. 4 we compare all the cases possible within the two-state model. As the model generality increases (from A to $\bar{C}$), so does the number of parameters; accordingly, the information capacity of the system also increases. The introduction of feedback in model C does not increase information transmission for small $\tau$ (input-switching dominated regime). However, the information gain coming from feedback is substantial for long time delays $\tau$ between the input and output readout (output-switching dominated regime). Information transmission between the input and the output can be improved beyond that achieved in the steady state (A, B and C) if the system is pushed out of equilibrium in specific ways ($\bar{A}$, $\bar{B}$ and $\bar{C}$) to respond to one-time signals. The information gain is particularly significant at fast timescales.

Push-pull circuits exist in many cells, from bacteria (heat-shock response [15]) to mammals (IκB-NF-κB circuit [17] in the animal stress response, and the p53-Mdm2 network involved in the DNA damage response [16]). They often combine a slow component (transcriptional regulation) with a fast one (protein-protein interaction), similarly to the design of our optimal architectures, resulting in pulse-like responses to stress signals. In both the steady state and the non-steady state response we find the same push-pull network topology. The difference between the designs is the absorbing state, which results in a single pulse response. The following pulse must be triggered by a new signal, hinting at a digital type of response [16].

This study gives a framework for studying information transmission between two time points in biochemical regulatory systems. Such an approach can be extended to more realistic models that explicitly account for protein concentrations, where costs of protein production and degradation can be studied in detail.

Acknowledgments. We thank William Bialek for 
helpful discussions. AMW is funded by a Marie Curie 
Career Integration Grant. 



Appendix A: Calculating mutual information

In the main text we calculate the mutual information between the input $z$ at time 0 and the output $x'$ at a later time $t$ using the temporal evolution of the joint probability distribution $p(x', z)$ obtained from a master equation. In this Appendix we give a detailed derivation of the steps of this calculation.

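We define the state $y$ of the system as

$$y = (x, z) \in \{(-,-), (-,+), (+,-), (+,+)\}, \tag{A1}$$

and the dynamics in terms of the transition rate matrix (rates as defined in Fig. 1; columns label the initial state, rows the final state, both in the order of Eq. (A1))

$$\mathcal{L} = \begin{pmatrix} u_m + s_m & -d_m & -r_m & 0 \\ -u_m & d_m + r_p & 0 & -s_p \\ -s_m & 0 & u_p + r_m & -d_p \\ 0 & -r_p & -u_p & d_p + s_p \end{pmatrix}. \tag{A2}$$

The corresponding master equation is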
$$\frac{dp}{dt} = -\mathcal{L}p \tag{A3}$$

for a vector $p = p(y) = (p(-,-), p(-,+), p(+,-), p(+,+))$ (we omit the implied dependence on time). Primed variables (e.g., $y'$) refer to the state of the system at time $t \neq 0$; unprimed ones refer to the state at $t = 0$. The transition probability matrix $p(y'|y)$ is a solution of the master equation with initial condition

$$\lim_{t\to 0} p(y'|y) = \delta_{y',y}. \tag{A4}$$



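It can be written as the $(y', y)$ element of the matrix $e^{-t\mathcal{L}}$, i.e.

$$p(y'|y) = \left[e^{-t\mathcal{L}}\right]_{y',y} = \sum_{a=1}^{4} e^{-\lambda_a t}\, v_a(y')\, u_a(y), \tag{A5}$$

where $\lambda_a$ (with $a = 1, \dots, 4$) are the four (assumed to be distinct for this derivation) eigenvalues of $\mathcal{L}$, and $v_a$ and $u_a$ are their corresponding orthonormal right and left eigenvectors, with components $v_a(y')$ and $u_a(y)$:

$$\mathcal{L} v_a = \lambda_a v_a, \tag{A6}$$
$$u_a \mathcal{L} = \lambda_a u_a, \tag{A7}$$
$$\sum_y u_a(y)\, v_b(y) = \delta_{ab}. \tag{A8}$$

In particular, if we choose a normalization such that $u_1 = (1, 1, 1, 1)$, the eigenvector $v_1$, corresponding to the eigenvalue $\lambda_1 = 0$, is the stationary state $p_\infty(y)$.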

We are computing the mutual information between $z$ at time 0 and $x'$ at time $t$, which is a function of $t$ and is given by

$$I[x_t, z_0] = \sum_{x', z} p(x', z)\,\log_2 \frac{p(x', z)}{p(x')\,p(z)}. \tag{A9}$$



The joint probability $p(x', z)$ is calculated from $p(y'|y)$ as

$$\begin{aligned} p(x', z) &= \sum_{y, y'} p(x', z\,|\,y', y)\, p(y', y) && \text{(definition of conditional probabilities)}\\ &= \sum_{y, y'} p(x'|y')\, p(z|y)\, p(y', y) && \text{(conditional independence of } x', z\text{)}\\ &= \sum_{y, y'} p(x'|y')\, p(z|y)\, p(y'|y)\, p_0(y) && \text{(definition of conditional probabilities).} \end{aligned} \tag{A10}$$



Note that the elements of $p(x'|y')$ and $p(z|y)$ are either 0 or 1 according to whether, for example, $y$ is consistent or inconsistent with $z$:

$$p(z = +\,|\,y = (+,+)) = 1, \qquad p(z = +\,|\,y = (+,-)) = 0, \tag{A11}$$

et cetera. Finally, the marginal probabilities $p(z)$ and $p(x')$ are given by

$$p(z) = \sum_{x'} p(x', z), \qquad p(x') = \sum_{z} p(x', z).$$

The numerical computation of the mutual information can now be implemented, and the optimal rates for systems of various complexity can be found numerically. In the paragraphs below we present useful computational details for implementing this calculation in MATLAB. We have implemented the optimization procedure in MATLAB and made the source code available via the following public access repository: http://infodyn.sourceforge.net.

For certain models we can also make analytical progress by exploiting spectral representations of the joint distribution, as shown in the paragraphs below; we present these results in Appendices D, E and F.

Numerical computation of the joint distribution. For numerical computation in MATLAB, it is useful to rewrite Eqn. (A10) in terms of matrix operations. To that end, we define the following matrices; note that $X$ and $Z$ are 0-1 matrices, whose elements are 0 or 1, as per Eqn. (A11):

$$X_{x',y'} = p(x'|y'), \tag{A12}$$
$$G_{y',y_0} = p(y'|y_0) = \left[e^{-t\mathcal{L}}\right]_{y',y_0}, \tag{A13}$$
$$P_{y_0,y} = p_0(y_0)\,\delta_{y_0,y}, \tag{A14}$$
$$Z_{y,z} = p(z|y). \tag{A15}$$

This allows us to write Eqn. (A10) compactly as

$$p(x', z) = \sum_{y'}\sum_{y_0}\sum_{y} X_{x',y'}\, G_{y',y_0}\, P_{y_0,y}\, Z_{y,z} \tag{A16}$$
$$= [XGPZ]_{x',z}, \tag{A17}$$

which is how the equation is implemented in MATLAB.
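The matrix form of Eqn. (A17) translates directly into array code. The sketch below is in Python/NumPy rather than the MATLAB used by the authors. The matrix names follow Eqns. (A12)-(A15), while the function names are hypothetical. It assumes the state ordering of Eq. (A1).

```python
import numpy as np
from scipy.linalg import expm

# y = (x, z) ordered as (-,-), (-,+), (+,-), (+,+); x' and z ordered as (-, +).
X = np.array([[1, 1, 0, 0],      # X[x', y'] = p(x' | y'), Eqn. (A12)
              [0, 0, 1, 1]], dtype=float)
Z = np.array([[1, 0],            # Z[y, z] = p(z | y), Eqn. (A15)
              [0, 1],
              [1, 0],
              [0, 1]], dtype=float)


def joint_matrix_form(L, t, p0):
    """Eqn. (A17): p(x', z) = [X G P Z]_{x', z}."""
    G = expm(-t * L)             # Eqn. (A13): G[y', y0] = p(y' | y0)
    P = np.diag(p0)              # Eqn. (A14): P[y0, y] = p0(y0) delta(y0, y)
    return X @ G @ P @ Z


def marginals(joint):
    """Return p(x') (summed over z) and p(z) (summed over x')."""
    return joint.sum(axis=1), joint.sum(axis=0)
```

Because $z$ is read at time 0, the $z$-marginal of the result always equals that of $p_0$. This gives a quick sanity check on any implementation.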
Analytical calculation of the joint distribution. For analytic calculations, it is useful to expand $p(y'|y)$ in Eqn. (A10) in terms of its spectral representation (Eqn. (A5)):

$$\begin{aligned} p(x', z) &= \sum_{a \ge 1}\sum_{y, y'} p(x'|y')\, p(z|y)\, e^{-\lambda_a t}\, v_a(y')\, u_a(y)\, p_0(y)\\ &= p_\infty(x')\, p_0(z) + \sum_{a > 1} e^{-\lambda_a t}\, V_a(x')\, U_a(z), \end{aligned} \tag{A18}$$

where

$$p_\infty(x') \equiv \sum_{y'} p(x'|y')\, p_\infty(y'), \tag{A19}$$
$$p_0(z) = \sum_{y} p(z|y)\, p_0(y), \tag{A20}$$
$$V_a(x') \equiv \sum_{y'} v_a(y')\, p(x'|y'), \tag{A21}$$
$$U_a(z) = \sum_{y} u_a(y)\, p_0(y)\, p(z|y). \tag{A22}$$

Writing $p(x', z)$ in this form makes it clear that, if the eigenvalues are distinct and thus $p(y'|y)$ is diagonalizable, then $p(x', z)$ factorizes as $t \to \infty$ and thus $I[x_t, z_0] \to 0$. Also clear is that $p(x', z) - p_\infty(x')\,p_0(z)$ is expressible as a sum of time-decaying exponentials. Since $p(x'|y')$ and $p(z|y)$ are 0-1 matrices, in many cases $V_a$ and $U_a$ can be calculated explicitly, as shown below.
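Numerically, the spectral representation (A5) can be built from an eigendecomposition. In this sketch (NumPy/SciPy assumed, function names hypothetical), the rows of the inverse of the right-eigenvector matrix serve as the left eigenvectors. This choice enforces the normalization of Eqn. (A8) automatically. Like the derivation above, it assumes distinct eigenvalues.

```python
import numpy as np
from scipy.linalg import expm  # used only to cross-check the spectral sum


def spectral_propagator(L, t):
    """Eqn. (A5): sum_a exp(-lambda_a t) v_a u_a, for distinct eigenvalues."""
    lam, V = np.linalg.eig(L)    # columns of V: right eigenvectors v_a
    U = np.linalg.inv(V)         # rows of U: left eigenvectors, U @ V = identity
    # Scale column a of V by exp(-lambda_a t), then contract with U.
    return ((V * np.exp(-lam * t)) @ U).real  # imaginary parts cancel for real L


def stationary_term(L):
    """The a = 1 (lambda = 0) term v_1 u_1; every column equals p_inf."""
    lam, V = np.linalg.eig(L)
    U = np.linalg.inv(V)
    k = int(np.argmin(np.abs(lam)))
    return np.real(np.outer(V[:, k], U[k, :]))


# Example with model A rates (u, r) = (0.3, 1.0).
u, r = 0.3, 1.0
L = np.array([[u, -u, -r, 0.0], [-u, u + r, 0.0, 0.0],
              [0.0, 0.0, u + r, -u], [0.0, -r, -u, u]])
print(np.max(np.abs(spectral_propagator(L, 0.7) - expm(-0.7 * L))))
```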



Appendix B: Optimization procedure

We optimize over the parameters of each model in order to maximize $\mathcal{I}(\tau) = I[x_{t=\tau/\lambda}, z_0]$, where $\tau$ is a dimensionless quantity that results from the rescaling procedure

$$t \to t \cdot \lambda = \tau,$$

where $\lambda$ is the system's relaxation rate (the smallest nonzero eigenvalue of the rate matrix $\mathcal{L}$). The steps of the "rescale and optimize" procedure are:



while 



< T < r„ 



1. optimize X(r; 0) over parameters 9 or parameters 
6 and initial distribution p{y) 

2. save I*,6l*,p* 

3. increment t 
end loop over t 



(A18) 



where 

calculate I(t; 6): 

1. calculate C{0) 

2. calculate A(£) 

3. calculate p(a;, z)— AT exp(— r£/A)PZ, as in 
Eqn. [XTTI 

4. calculate I[p{x, z)] 

return z)] to optimization algorithm 
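The "rescale and optimize" loop can be sketched in a few lines for model A. To keep the sketch short, it replaces the optimizer with a grid search over $u$ and fixes $r = 1$. This is a simplifying assumption: $\mathcal{I}(\tau)$ is unchanged when all rates are scaled together, so only $u/r$ matters, and here only $u/r \le 1$ is scanned. NumPy/SciPy are assumed and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def model_a_matrix(u, r):
    """Model A: u_m = u_p = d_m = d_p = u, r_m = r_p = r, s_m = s_p = 0."""
    return np.array([[u, -u, -r, 0.0],
                     [-u, u + r, 0.0, 0.0],
                     [0.0, 0.0, u + r, -u],
                     [0.0, -r, -u, u]])


def info_rescaled(u, r, tau):
    """I(tau) = I[x_{t = tau/lambda}, z_0], steady-state case p0 = p_inf."""
    L = model_a_matrix(u, r)
    w, V = np.linalg.eig(L)
    lam = np.sort(w.real)[1]                  # smallest non-zero eigenvalue
    p0 = np.real(V[:, np.argmin(np.abs(w))])
    p0 = p0 / p0.sum()                        # stationary distribution
    X = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
    Z = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
    joint = X @ expm(-(tau / lam) * L) @ np.diag(p0) @ Z   # Eqn. (A17)
    outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    m = joint > 0
    return float(np.sum(joint[m] * np.log2(joint[m] / outer[m])))


def rescale_and_optimize(taus, u_grid, r=1.0):
    """Grid-search stand-in for the optimizer: best u at each fixed tau."""
    results = []
    for tau in taus:
        values = [info_rescaled(u, r, tau) for u in u_grid]
        k = int(np.argmax(values))
        results.append((tau, float(u_grid[k]), values[k]))
    return results


for tau, u_star, I_star in rescale_and_optimize([0.5, 2.0, 5.0], np.linspace(0.05, 1.0, 96)):
    print(tau, u_star, I_star)
```

A gradient-based or constrained optimizer would replace the grid in practice, especially for the eight-rate models. The grid keeps the rescaling step, which is the essential part of the procedure, easy to see.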

The results are obtained as hull plots, as presented in Fig. 5 for model A.

FIG. 5: An explicit construction of the optimal information curves presented in the main text, here shown for model A. The maximum value $\mathcal{I}^*(\tau)$ for each $\tau$ (red *) is obtained by optimizing the rates $u$ and $r$ at each $\tau$. The optimal rates can be different for each $\tau$. The blue continuous curves show the whole range of $\mathcal{I}(\tau)$: each of them passes through a point $(\tau, \mathcal{I}^*(\tau))$ and is computed by using the corresponding optimal set of rates.
Appendix C: Derivation of mutual information for 2-bit, symmetric systems

In this Appendix we derive useful relations for the mutual information in the case of 1-bit symmetric systems. Consider any distribution $p(a, b)$ where $a, b \in \{-1, +1\}$ and the distribution is symmetric under flipping $-1 \leftrightarrow +1$. In this case $p(+,+) = p(-,-)$ and $p(+,-) = p(-,+)$. For any such distribution the mutual information greatly simplifies, as exploited in the analytic results associated with model A.

Let us define $-1 < \eta < 1$ such that

$$p(+,+) = p(-,-) = (1+\eta)/4, \tag{C1}$$
$$p(-,+) = p(+,-) = (1-\eta)/4. \tag{C2}$$

From these, we see that

$$p(a) = p(b) = 1/2 \tag{C3}$$

and

$$\begin{aligned} I[a, b] &= \sum_{a,b} p(a,b)\log_2\frac{p(a,b)}{p(a)\,p(b)} = \sum_{a,b} p(a,b)\log_2 4p(a,b)\\ &= 2p(-,-)\log_2 4p(-,-) + 2p(-,+)\log_2 4p(-,+)\\ &= \frac{1+\eta}{2}\log_2(1+\eta) + \frac{1-\eta}{2}\log_2(1-\eta), \end{aligned} \tag{C4}$$

as in the main text, where we replace $\eta$ by the appropriate expression for $\mu$ in each model. Note also that

$$p(a = b, b) - p(a \neq b, b) = p(+,+) - p(-,+) = \tfrac{1}{4}(1 + \eta - 1 + \eta) = \eta/2. \tag{C5}$$
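The identity (C4) is easy to verify numerically against the definition of the mutual information. The sketch below (NumPy assumed, names hypothetical) compares the two for several values of $\eta$, and its usage line also checks Eqn. (C5).

```python
import numpy as np


def info_symmetric(eta):
    """Eqn. (C4); assumes -1 < eta < 1 so both logarithms are finite."""
    return 0.5 * (1 + eta) * np.log2(1 + eta) + 0.5 * (1 - eta) * np.log2(1 - eta)


def symmetric_joint(eta):
    """Eqns. (C1)-(C2): rows a = -, +; columns b = -, +."""
    return np.array([[1 + eta, 1 - eta],
                     [1 - eta, 1 + eta]]) / 4.0


def info_direct(joint):
    """Definition of I[a, b] for a 2x2 joint with strictly positive entries."""
    pa = joint.sum(axis=1)
    pb = joint.sum(axis=0)
    return float(np.sum(joint * np.log2(joint / np.outer(pa, pb))))


# Eqn. (C5): p(+,+) - p(-,+) = eta / 2
j = symmetric_joint(0.5)
print(j[1, 1] - j[0, 1])
```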



Appendix D: Explicit calculation for model A

Model A describes a system in which $z$ up-regulates $x$ and $x$ is slaved to $z$. In this case we can diagonalize $\mathcal{L}$ analytically and calculate the mutual information. We set

$$u_m = u_p = d_m = d_p \equiv u, \qquad r_m = r_p \equiv r, \qquad s_m = s_p = 0,$$

and the initial probability $p_0(y)$ is set equal to the steady state $p_\infty(y)$.

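In this case the transition rate matrix $\mathcal{L}$ is given by

$$\mathcal{L} = \begin{pmatrix} u & -u & -r & 0 \\ -u & u + r & 0 & 0 \\ 0 & 0 & u + r & -u \\ 0 & -r & -u & u \end{pmatrix}$$

and its spectrum is

$$\{\lambda_1 = 0,\ \lambda_2 = r,\ \lambda_3 = 2u,\ \lambda_4 = r + 2u\}. \tag{D1}$$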



The relaxation rate of the system is given by the first non-zero eigenvalue, which switches from $\lambda_{\rm relaxation} = \lambda_3 = 2u$ for small $\tau$ to $\lambda_{\rm relaxation} = \lambda_2 = r$ for large $\tau$. This change in the rates that govern the relaxation times marks the crossover shown in Fig. 2 in the main text.

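The right eigenvectors are

$$v_1 = \frac{1}{2(r+2u)}\begin{pmatrix} r+u \\ u \\ u \\ r+u \end{pmatrix},\quad v_2 = \frac{u}{2(r-2u)}\begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix},\quad v_3 = \frac{1}{2(r-2u)}\begin{pmatrix} u-r \\ -u \\ u \\ r-u \end{pmatrix},\quad v_4 = \frac{u}{2(r+2u)}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}. \tag{D2}$$

The left eigenvectors are

$$u_1 = (1, 1, 1, 1),\quad u_2 = \left(-1, \frac{r-u}{u}, \frac{u-r}{u}, 1\right),\quad u_3 = (-1, 1, -1, 1),\quad u_4 = \left(1, -\frac{u+r}{u}, -\frac{u+r}{u}, 1\right). \tag{D3}$$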



Using the expressions introduced in Appendix C, we find that $p(x', z)$ is given by the following $2\times 2$ matrix (rows $x' = -, +$; columns $z = -, +$):

$$p(x', z) = \frac{1}{4(r-2u)(r+2u)}\begin{pmatrix} (1+e^{-2ut})r^2 + 2(e^{-2ut} - 2e^{-rt})ru - 4u^2 & (1-e^{-2ut})r^2 + 2(2e^{-rt} - e^{-2ut})ru - 4u^2 \\ (1-e^{-2ut})r^2 + 2(2e^{-rt} - e^{-2ut})ru - 4u^2 & (1+e^{-2ut})r^2 + 2(e^{-2ut} - 2e^{-rt})ru - 4u^2 \end{pmatrix}. \tag{D4}$$



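We can then compute $I[x_t, z_0]$, which, after some algebraic manipulation, reads:

$$\begin{aligned} I[x_t, z_0] = \frac{1}{2}\Bigg[&\left(1 + \frac{r(r+2u)e^{-2ut} - 4ur\,e^{-rt}}{(r-2u)(r+2u)}\right)\log_2\left(1 + \frac{r(r+2u)e^{-2ut} - 4ur\,e^{-rt}}{(r-2u)(r+2u)}\right)\\ +&\left(1 - \frac{r(r+2u)e^{-2ut} - 4ur\,e^{-rt}}{(r-2u)(r+2u)}\right)\log_2\left(1 - \frac{r(r+2u)e^{-2ut} - 4ur\,e^{-rt}}{(r-2u)(r+2u)}\right)\Bigg]. \end{aligned} \tag{D5}$$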



If we introduce the quantity

$$\mu = \frac{r(r+2u)\,e^{-2ut} - 4ur\,e^{-rt}}{(r-2u)(r+2u)}, \tag{D6}$$

we recover Eqn. (3) in the main text. We note that for large $\tau$ the optimal rates are $r^* = 2u^*$, so that $\lambda_{\rm relaxation} = \lambda_2 = \lambda_3$. For this special case, the expressions above may be evaluated by Taylor expanding about $r = 2u$ to find

$$\mu = \frac{e^{-rt}}{2}\,(1 + 2rt). \tag{D7}$$



We see that for $t = 0$, $\mu = 1/2$, and for long times $\mu \to 0$ and $I[x_t, z_0] \to 0$. Moreover, on all timescales we find that in the stationary state $p(x = z)/p(x = -z) = r/u + 1 > 1$.

From Eqn. (D5), setting $\partial I[x_t, z_0]/\partial t\,\big|_{t = t^*} = 0$ and using the appropriate expression for $\mu$, we find the optimal time lag for model A:

$$t^* = \log\left(\frac{r + 2u}{2r}\right)\Big/(2u - r) \quad \text{for } r \neq 2u, \tag{D8}$$

or $t^* = 1/(2r)$ for $r = 2u$, as presented in the main text.
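The spectrum (D1) and the degenerate-case result (D7) can be cross-checked against the exact propagator. This sketch follows the conventions of Appendix A; NumPy/SciPy are assumed and the names are hypothetical. It computes $\mu = \langle x(t)\,z(0)\rangle$ in the stationary state, which equals $2[p(x'=z, z) - p(x'\neq z, z)]$ for this flip-symmetric model.

```python
import numpy as np
from scipy.linalg import expm


def model_a_matrix(u, r):
    """Model A rate matrix, ordering (-,-), (-,+), (+,-), (+,+)."""
    return np.array([[u, -u, -r, 0.0],
                     [-u, u + r, 0.0, 0.0],
                     [0.0, 0.0, u + r, -u],
                     [0.0, -r, -u, u]])


def mu_numeric(u, r, t):
    """mu = <x(t) z(0)> with p0 = p_inf, from the propagator e^{-tL}."""
    L = model_a_matrix(u, r)
    # Stationary state (r+u, u, u, r+u) / (2(r+2u)); it satisfies L p = 0.
    p_inf = np.array([r + u, u, u, r + u]) / (2 * (r + 2 * u))
    x = np.array([-1, -1, 1, 1])    # value of x in each joint state
    z = np.array([-1, 1, -1, 1])    # value of z in each joint state
    return float(x @ expm(-t * L) @ (z * p_inf))


# Degenerate case r = 2u: compare with mu = e^{-rt} (1 + 2rt) / 2.
for t in (0.3, 1.7):
    print(mu_numeric(0.5, 1.0, t), np.exp(-t) * (1 + 2 * t) / 2)
```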



Appendix E: Extension to model C

In model C we allow all eight rates to be nonzero and different from each other. Because of the symmetries of the system (e.g., relabeling the nodes and their rates), many parameter settings perform identically. Qualitatively, these topologies may all be described as either

1. z activates x, which in turn represses z, or

2. z represses x, which in turn activates z.

As an example, we consider a topology of the first type, where the rates are the following:

• $u_p = d_m = 0$,
• $d_p = u_m$ and $r_p = r_m$,
• $s_p = s_m = 0$.

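The spectrum of $\mathcal{L}$ is

$$\lambda_1 = 0, \qquad \lambda_2 = r_m + u_m, \qquad \lambda_{3,4} = \frac{r_m + u_m \mp \sqrt{r_m^2 - 6r_m u_m + u_m^2}}{2}. \tag{E1}$$

For small $\tau$ the relaxation time is given by $\lambda_3$. For $u_m = (3 - 2\sqrt{2})\,r_m$, $\lambda_3 = \lambda_4$. Since we constrain the timescales such that $\max\{u_m, d_p, u_p, d_m, r_m, r_p\} = 1$, for large $\tau$ the optimal values of $r_m$ and $u_m$ are $r_m^* = 1$ and $u_m^* = 3 - 2\sqrt{2} < 1$, as can be seen from Fig. 2C. The right eigenvectors are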



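$$v_1 = \frac{1}{2(r_m + u_m)}\begin{pmatrix} r_m \\ u_m \\ u_m \\ r_m \end{pmatrix}, \qquad v_2 = \frac{1}{2(r_m + u_m)}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}, \tag{E2}$$

$$v_{3,4} = \frac{1}{2u_m\,(r_m + u_m - 2\lambda_{3,4})}\begin{pmatrix} r_m - \lambda_{3,4} \\ u_m \\ -u_m \\ \lambda_{3,4} - r_m \end{pmatrix}, \tag{E3}$$

where $r_m + u_m - 2\lambda_{3,4} = \pm\sqrt{r_m^2 - 6r_m u_m + u_m^2}$. The left eigenvectors are

$$u_1 = (1, 1, 1, 1), \quad u_2 = (u_m, -r_m, -r_m, u_m), \quad u_{3,4} = (u_m,\ u_m - \lambda_{3,4},\ \lambda_{3,4} - u_m,\ -u_m).$$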



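The mutual information is given by Eqn. (3) in the main text, with $\mu$ given by

$$\mu = \frac{e^{-(r_m + u_m)t/2}}{r_m + u_m}\left[(r_m - u_m)\cosh\left(\frac{t}{2}\sqrt{\Delta}\right) + \frac{r_m^2 + 4r_m u_m - u_m^2}{\sqrt{\Delta}}\,\sinh\left(\frac{t}{2}\sqrt{\Delta}\right)\right], \tag{E4}$$

$$\Delta \equiv r_m^2 - 6r_m u_m + u_m^2. \tag{E5}$$

For $t = 0$, $\mu_0 = (r_m - u_m)/(r_m + u_m)$, and $I[x_t, z_0] \approx 1$ bit when $u_m \ll r_m$, as can be seen from Fig. 2C in the main text. We can define

$$x^* \equiv \frac{t}{2}\sqrt{\Delta} \tag{E6}$$

and, noting that

$$\lim_{x^* \to 0}\frac{\sinh x^*}{x^*} = 1, \tag{E7}$$

we obtain for large $\tau$ the optimal value ($r_m^* = 1$, $u_m^* = 3 - 2\sqrt{2}$, i.e. $\Delta = 0$)

$$\mu^* = \frac{e^{-(2 - \sqrt{2})t}}{\sqrt{2}}\,(1 + t). \tag{E8}$$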
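Equations (E4)-(E8) can be checked against the exact propagator for the example push-pull parameterization used in this Appendix ($d_p = u_m$, $r_p = r_m$, all other rates zero). NumPy/SciPy are assumed and the function names are hypothetical. The closed form requires $\Delta > 0$; the degenerate case $\Delta = 0$ is compared with (E8) instead.

```python
import numpy as np
from scipy.linalg import expm


def push_pull_matrix(um, rm):
    """Example of this Appendix: d_p = u_m, r_p = r_m, u_p = d_m = s_p = s_m = 0."""
    return np.array([[um, 0.0, -rm, 0.0],
                     [-um, rm, 0.0, 0.0],
                     [0.0, 0.0, rm, -um],
                     [0.0, -rm, 0.0, um]])


def mu_closed_form(um, rm, t):
    """Eqns. (E4)-(E5); requires Delta = rm^2 - 6 rm um + um^2 > 0."""
    delta = rm**2 - 6 * rm * um + um**2
    k = np.sqrt(delta) / 2
    pref = np.exp(-(rm + um) * t / 2) / (rm + um)
    return pref * ((rm - um) * np.cosh(k * t)
                   + (rm**2 + 4 * rm * um - um**2) / np.sqrt(delta) * np.sinh(k * t))


def mu_numeric(um, rm, t):
    """<x(t) z(0)> in the stationary state, via the propagator e^{-tL}."""
    L = push_pull_matrix(um, rm)
    p_inf = np.array([rm, um, um, rm]) / (2 * (rm + um))   # solves L p = 0
    x = np.array([-1, -1, 1, 1])
    z = np.array([-1, 1, -1, 1])
    return float(x @ expm(-t * L) @ (z * p_inf))


uc = 3 - 2 * np.sqrt(2)
print(mu_numeric(uc, 1.0, 2.0), np.exp(-(2 - np.sqrt(2)) * 2.0) * 3.0 / np.sqrt(2))
```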



Appendix F: Model $\bar{A}$

Model $\bar{A}$ is the same as model A, except that the initial probability $p_0(y)$ is not given by the steady state but is optimized. In order to optimize $I[x_t, z_0]$, one demands that the entropy $S[p(z)]$ is equal to 1 [3T]. This, together with the symmetry $-1 \leftrightarrow +1$ for $x$ and $z$, constrains the form of the initial distribution to be parameterized as

$$p_0(x, z) = \frac{1}{4}\begin{pmatrix} 1 + \mu_0 & 1 - \mu_0 \\ 1 - \mu_0 & 1 + \mu_0 \end{pmatrix}. \tag{F1}$$



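The probability $p(x', z)$ then reads

$$p(x', z) = \frac{1}{4}\begin{pmatrix} 1 + \mu_0 e^{-rt} + \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right) & 1 - \mu_0 e^{-rt} - \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right) \\ 1 - \mu_0 e^{-rt} - \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right) & 1 + \mu_0 e^{-rt} + \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right) \end{pmatrix}, \tag{F2}$$

and we can explicitly compute the mutual information:

$$\begin{aligned} I[x_t, z_0] = \frac{1}{2}\Bigg[&\left(1 + \mu_0 e^{-rt} + \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right)\right)\log_2\left(1 + \mu_0 e^{-rt} + \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right)\right)\\ +&\left(1 - \mu_0 e^{-rt} - \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right)\right)\log_2\left(1 - \mu_0 e^{-rt} - \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right)\right)\Bigg]. \end{aligned} \tag{F3}$$

The above equation can again be rewritten as in Eqn. (3) of the paper if we introduce

$$\mu = \mu_0 e^{-rt} + \frac{r}{r-2u}\left(e^{-2ut} - e^{-rt}\right). \tag{F4}$$

For $t = 0$, $\mu = \mu_0$, and the information is maximized by $\mu = \mu_0 = \pm 1$.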



Appendix G: Models B and $\bar{B}$

Models B and $\bar{B}$ are extensions of models A and $\bar{A}$, respectively, where we allow $z$ to also be a repressor of $x$ ($s_m \neq 0$, $s_p \neq 0$, and we no longer demand $r_p = r_m$). As in models A and $\bar{A}$, we do not allow feedback from $x$ to $z$, meaning that the transition rates for $z$ do not depend on the state of $x$ (i.e. $u_m = u_p = u$ and $d_p = d_m = d$). The results for the optimal parameters and topologies of these models are very similar to those of models A and $\bar{A}$ and are not shown in the main text; we plot them in Fig. 6 here. We note that in model $\bar{B}$ we have absorbing states, similar to those of model $\bar{C}$ shown in the main text. As in model $\bar{C}$, the symmetries of the problem allow many equivalent (under permutation of labels) parameter settings: we show all of them in Fig. 7. However, unlike in model $\bar{C}$, where all states are visited, in each optimal setting of model $\bar{B}$ one state (shadowed in Fig. 7) is never visited.






FIG. 8: Optimal topologies and dynamics for model $\bar{C}$, where the steady state assumption is relaxed and feedback is present. The topologies of these optimal networks were found by inspecting the optimal rates manually. The optimal initial distribution is split slightly unevenly between the beginning and end state.



FIG. 6: Example of optimal parameters, along with transmitted information $\mathcal{I}^*$, for models B and $\bar{B}$. Parameters in square brackets refer to alternative optimal topologies where $z$ down-regulates $x$ instead of up-regulating it (see Fig. 7B and $\bar{B}$). Subscripts m, p are omitted when $X_m = X_p$ (with $X = u, d, r, s$). In model $\bar{B}$ the optimal rates do not change with $\tau$, and the optimal initial distribution is split slightly unevenly between the beginning and end state.






FIG. 7: Full set of optimal topologies and dynamics for models B and $\bar{B}$. The topologies of these optimal networks were found by inspecting the optimal rates manually. In model $\bar{B}$ some states (shown in gray in the figure) are never visited.



Appendix H: Model $\bar{C}$

Model $\bar{C}$ is the same as model C, except that the steady state assumption is now relaxed and the system is also optimized over the initial distribution (as in models $\bar{A}$ and $\bar{B}$). The results are discussed in detail in the main text; here, we simply show in Fig. 8 the four maximally informative topologies that have been found. Each of them can be labeled as a "push-pull" network and features an absorbing state.



Appendix I: Information decay in a system with fixed parameters

In the main text, for each relaxation time, we optimize the parameters of the system in order to get a maximally informative network. However, real regulatory networks have a fixed set of parameters for all times. In Fig. 9 we therefore plot the mutual information between the initial input $z_0$ and a delayed output $x_t$ as a function of the delay $t$, for different sets of fixed parameters. In particular, we consider models A, C and $\bar{A}$. Information decays more slowly if the system is initially in steady state, and therefore remains in steady state (models A and C). However, being out of steady state (model $\bar{A}$) allows for larger information transmission at small times.



[Figure 9 legend: model A with u = 0.1648, 0.2552 and 0.5; model C with u = 3 − 2√2; model $\bar{A}$ with u = 0.5.]




FIG. 9: The mutual information between the initial input $z_0$ and a delayed output $x_t$ as a function of the delay $t$ for different sets of fixed parameters. A comparison of model A (black lines), model C (red line) and model $\bar{A}$ (blue line).